We present the EDA of the datasets we have at this stage. We do not want to polish the analyses completely yet, because the dataset will probably evolve, both as a result of what we present and after we consult on it.
We started by collecting data from Twitter. After the initial struggles with its API, it turned out that the limits imposed on data access would severely restrict the analyses of Twitter we could run for our project's topic.
Fortunately, the snscrape tool, released under a GNU license (https://github.com/JustAnotherArchivist/snscrape), came to the rescue. It offers considerable flexibility and capability for scraping tweets. Of all the attributes a single tweet has, we decided to archive: 'Datetime', 'Tweet Id','Text','Username','Replies Count','Retweets Count','Likes Count','Quotes Count', 'Language','Retweeted Tweet','Quoted Tweet','Mentioned Users', as the most useful for further analyses.
So far we have most often filtered by keywords, date, author, and tweet language. Below is the list of datasets with their queries, as a small insight into what to expect:
DATA:
The analysis will be split into 3 notebooks. This one is devoted to the larger datasets and to the data as a whole. The two separate ones cover the dataset of Russian-language tweets and the sanctions, respectively.
We can therefore distinguish several larger datasets:
Middle_of_2021, intended to enable comparisons with the period before the conflict fully escalated.
Before_war, which still predates the invasion; keep in mind, however, that it already covers the period of intensive troop build-up at the border.
And 2 datasets from the first day of the invasion, one in English and the other in Russian.
We also have a number of smaller datasets concerning what we consider the most interesting sanctions successively imposed on the Russian state. The first one, Bucha Genocide, instead collects tweets about the massacre of civilians at the beginning of April in the town of Bucha.
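The filtering by keywords, date, author, and language mentioned earlier can be expressed as a single search string. The sketch below is only an illustration: `build_query` is a hypothetical helper, the keywords are made up, and the operators (`from:`, `lang:`, `since:`, `until:`, `OR`) are assumed from Twitter's public search syntax. The exact queries used for the datasets are not reproduced here.

```python
from datetime import date
from typing import Optional, Sequence


def build_query(
    keywords: Sequence[str],
    author: Optional[str] = None,
    lang: Optional[str] = None,
    since: Optional[date] = None,
    until: Optional[date] = None,
) -> str:
    """Assemble a Twitter-style search query string (hypothetical helper)."""
    parts = []
    if keywords:
        # Any of the keywords may match, so join them with OR.
        parts.append("(" + " OR ".join(keywords) + ")")
    if author:
        parts.append(f"from:{author}")
    if lang:
        parts.append(f"lang:{lang}")
    if since:
        parts.append(f"since:{since.isoformat()}")
    if until:
        parts.append(f"until:{until.isoformat()}")
    return " ".join(parts)


# Made-up example: English tweets about sanctions on the first day of the invasion.
query = build_query(
    ["sanctions", "embargo"],
    lang="en",
    since=date(2022, 2, 24),
    until=date(2022, 2, 25),
)
print(query)
```

A string of this shape is what a search-based scraper would receive; keeping the query next to each dataset makes it easier to see what a given file can and cannot contain.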
import spacy
import pandas as pd
from tqdm.auto import tqdm
import swifter
import plotly.express as px
from wordcloud import WordCloud
from matplotlib import pyplot as plt
import textacy
from collections import Counter
import random
import os
import pickle
from pathlib import Path
import ast
pd.options.plotting.backend = "plotly"
random.seed(123)
en = spacy.load("en_core_web_sm")
def cloud_from_lemmas(word_counts):
    # Draw a word cloud from a mapping of word -> frequency.
    wc = WordCloud(width=800, height=400)
    wc.generate_from_frequencies(frequencies=word_counts)
    plt.figure(figsize=(10,8))
    plt.imshow(wc)

def plot_counts(counts):
    # Horizontal bar chart of the counts, with the most frequent word at the top.
    fig = px.bar(counts,orientation='h', y='word', x='count')
    fig['layout']['yaxis']['autorange'] = "reversed"
    fig.update_layout(bargap=0.30, font={'size':10})
    return fig
We start with an analysis of the tweets posted by MEPs.
df_MEPs = pd.read_csv("./data/twitter_MEPs_2k_2y.csv")
df_MEPs = df_MEPs.loc[df_MEPs['Language'] == 'en']  # in case the scraper misclassified the language, which happens occasionally
df_MEPs
| | Unnamed: 0 | Datetime | Tweet Id | Text | Username | Replies Count | Retweets Count | Likes Count | Quotes Count | Language | Retweeted Tweet | Quoted Tweet | Mentioned Users |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2022-04-08 06:00:00+00:00 | 1512309265641480192 | If I knew someone needed help, I would help th... | SCHIEDER | 0 | 0 | 4 | 0 | en | NaN | NaN | ['ZiobroPL'] |
| 1 | 1 | 2022-04-07 15:55:58+00:00 | 1512096856410574851 | EU, wake up! Don't leave West Balkans to Russi... | SCHIEDER | 0 | 0 | 5 | 0 | en | NaN | NaN | NaN |
| 2 | 2 | 2022-04-07 10:32:25+00:00 | 1512015431866978305 | Read here more how Europe must not disappoint ... | SCHIEDER | 0 | 0 | 1 | 0 | en | NaN | NaN | NaN |
| 3 | 3 | 2022-04-07 07:20:02+00:00 | 1511967017779224577 | Thank you for the good exchange about the curr... | SCHIEDER | 0 | 4 | 26 | 1 | en | NaN | NaN | ['anlifirat', 'bediaozgokce'] |
| 4 | 4 | 2022-04-06 12:29:20+00:00 | 1511682465362059269 | EU needs to be more active to end this crisis,... | SCHIEDER | 1 | 0 | 3 | 0 | en | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 124170 | 124170 | 2020-03-19 07:27:22+00:00 | 1240540351418564614 | There a no limits. https://t.co/w5VQk4TovP | tomastobe | 0 | 0 | 5 | 0 | en | NaN | https://twitter.com/Lagarde/status/12404149189... | NaN |
| 124171 | 124171 | 2020-03-08 08:20:55+00:00 | 1236567560558084097 | Today is International Women's Day.\nAs Chair ... | tomastobe | 1 | 1 | 10 | 0 | en | NaN | NaN | NaN |
| 124172 | 124172 | 2020-03-06 15:29:25+00:00 | 1235950619279716353 | @Kribberg Tack! | tomastobe | 0 | 0 | 0 | 0 | en | NaN | NaN | ['Kribberg'] |
| 124173 | 124173 | 2020-02-27 09:54:30+00:00 | 1232967231732146176 | EU development policy needs a fresh start. Dev... | tomastobe | 0 | 8 | 14 | 1 | en | NaN | NaN | ['euobs'] |
| 124174 | 124174 | 2020-02-26 13:26:16+00:00 | 1232658139364941824 | I strongly condemn that 1/3 of Poland is now d... | tomastobe | 12 | 23 | 110 | 0 | en | NaN | NaN | NaN |
121796 rows × 13 columns
MEPs_dict = Path('meps_string.txt').read_text().replace('\n', '')
MEPs_dict = ast.literal_eval(MEPs_dict)
MEPs_dict
{'MEPsAustria': ['SCHIEDER',
'AngelikaWinzig',
'thalerbarbara',
'VollathBettina',
'dieGamon',
'Evelyn_Regner',
'georgmayermep',
'HannesHeide',
'vilimsky',
'lukasmandl',
'MonikaVana',
'othmar_karas',
'thomaswaitz'],
'MEPsBelgium': ['Assita_Kanko',
'BenoitLutgen',
'FranssenCindy',
'Frederiqueries',
'GeertBourgeois',
'gannemans',
'guyverhofstadt',
'hildevautmans',
'jvanovertveldt',
'kvanbrempt',
'BotengaM',
'marctarabella',
'Mariearenaps',
'OChastel',
'pascal_arimont',
'ph_lamberts',
'msaraswati',
'saskiabricmont',
'TomVandendriese',
'tomvdkendelaere'],
'MEPsBulgaria': ['andreykovatchev',
'AndreyNovakov',
'AndreySlabakov',
'djambazki',
'AdemovAsim',
'AtidzheV',
'ElenaYoncheva',
'Emil_Radev',
'EvaMaydell',
'ilhankyuchyuk',
'Iskra_Mihaylova',
'IvoHristovMEP',
'PetarVitanovMEP',
'rmkanev',
'SergeiStanishev',
'tsvetypenkova'],
'MEPsCroatia': ['BiljanaBorzan',
'IvanVilibor',
'KarloRessler',
'ladislav_ilcic',
'mislavkolakusic',
'fred_matic',
'JerkovicRomana',
'SuncanaGlavak',
'TomislavSokol',
'TPicula',
'ValterFlego',
'ZovkoEU'],
'MEPsCyprus': ['MavridesCostas',
'DemPapadakis',
'LefChristoforou',
'loucas_fourlas',
'NKizilyurek'],
'MEPsCzechia': ['AlexandrVondra',
'charanzova',
'EvzenTosenovsky',
'IvanDav73523299',
'ZahradilJan',
'Pospisil_Jiri',
'Konecna_K',
'LudekNie',
'PiratKolaja',
'MarketkaG',
'Mdlabajova',
'msojdrova',
'vonpecka',
'KnotekOndrej',
'OKovarikMEP',
'radkamaxova',
'stanislavpolcak',
'TomasZdechovsky',
'vrecionova'],
'MEPsDenmark': ['AsgerChristens2',
'SchaldemoseMEP',
'karmel80',
'Kira_MPH',
'linealidell',
'MargreteAuken',
'MarianneVind',
'Loekkegaard_MEP',
'mortenhelveg',
'NielsFuglsang',
'nvillumsen',
'WeissPernille',
'KofodPeter',
'SoerenGade'],
'MEPsEstonia': ['Ansip_EU',
'JaakMadison',
'MarinaKaljurand',
'RihoTerras',
'svenmikser',
'Urmaspaet',
'YanaToom'],
'MEPsFinland': ['alviinaalametsa',
'EeroHeinaluoma',
'ElsiKatainen',
'HeidiHautala',
'HennaVirkkunen',
'LauraHuhtasaari',
'PekkarinenMauri',
'miapetrakumpula',
'NilsTorvalds',
'petrisarvamaa',
'silviamodig',
'spietikainen',
'VilleNiinisto'],
'MEPsFrance': ['AgnesEvren',
'AndreRougeOff',
'ASanderMEP',
'ASPelletier',
'Bruna_Annika',
'ArnaudDanjean',
'AureliaBeigneux',
'AuroreLalucq',
'BenoitBiteau',
'guetta_en',
'BriceHortefeux',
'CarolineRooseEU',
'CathChabaud',
'GrisetCatherine',
'GrudlerCh',
'CZacharopoulou',
'gruffat_claude',
'DamienCAREME',
'DavidCormand',
'DominiqueBilde',
'DominiqueRiquet',
'emmanuelmaurel',
'ericandrieueu',
'fabienne_keller',
'FranceJamet',
'F_Alfonsi',
'fxbellamy',
'GeoffroyDidier',
'GilbertCollard',
'GillesBoyer',
'Gilles_Lebreton',
'GDelbosCorfield',
'HeleneLaporteRN',
'HerveJuvin',
'ilanacicurelrem',
'ITolleret',
'JFJalkh',
'jllacapelle',
'JPGarraud',
'JDecerle',
'jerome_riviere',
'JoelleMelinRN',
'J_Bardella',
'JLechanteux',
'KarimaDelli',
'laurencefarreng',
'leilachaibi',
'ManonAubryFr',
'mbompard',
'marietouss1',
'MariePierreV',
'MAndrouet',
'MaxettePirbakas',
'MicheleRivasi',
'MounirSatouri',
'nadine__morano',
'ncolin_oesterle',
'NathalieLoiseau',
'NicolasBay_',
'MebarekNora',
'pcanfin',
'PDurandOfficiel',
'PhOlivierRN',
'Pierre_Ka',
'larrouturou',
'rglucks1',
'salima_yenbou',
'sandrogozi',
'StephaneBIJOUX',
'steph_sejourne',
's_yoncourtin',
'syl_brunet',
'sylvieguillaume',
'ThierryMARIANI',
'valeriehayer',
'VTrillet_Lenoir',
'v_joron',
'yjadot',
'younousomarjee'],
'MEPsGermany': ['AlexandraGeese',
'Andi_Glueck',
'Andreas_Schwab',
'ANiebler',
'anna_cavazzini',
'AnnaDeparnay',
'AxelVossMdEP',
'berndlange',
'BernhardZimniok',
'BirgitSippelMEP',
'Doleschal',
'AndersonAfDMdEP',
'ChSchneider',
'ConstanzeKrehl',
'ErnstCornelia',
'd_boeselager',
'caspary',
'daniel_freund',
'davidmcallister',
'delarabur',
'RadtkeMdEP',
'EnginEroglu_FW',
'ErikMarquardt',
'egebhardtMdEP',
'gabischoff',
'GuidoReil',
'gunnar_beck',
'HNeumannMEP',
'HelmutScholzMEP',
'henrikehahn',
'hildebentele',
'IsmailErtug',
'jcoetjen',
'EuropaJens',
'Joachim_Kuhs',
'Schuster_MdEP',
'Joerg_Meuthen',
'JuttaPaulusRLP',
'kbr_europa',
'karstenlucke',
'katarinabarley',
'k_langensiepen',
'LarsPatrickBerg',
'lenaduepontmdep',
'ManfredWeber',
'RipaManuela',
'MariaNoichl',
'MarionWalsmann',
'BuchheitMarkus',
'MarkusFerber',
'markuspieperMEP',
'BuschmannMartin',
'MartinHaeusling',
'schirdewan',
'MartinSonneborn',
'martina_michels',
'KrahMax',
'micha_bloss',
'gahler_michael',
'MHohlmeier',
'moritzkoerner',
'nicosemsrott',
'nicolabeerfdp',
'Nicolaus_Fest',
'nnienass',
'linsnorbert',
'OezlemADemirel',
'echo_pbreyer',
'peter_jahr',
'peterliese',
'DrPierrette',
'RasmusAndresen',
'bueti',
'RomeoFranz1',
'sabineverheyen',
'SLagodinsky',
'SkaKeller',
'DrStefanBerger',
'svensimon',
'svenja_hahn',
'Sylvia_Limmer',
'TerryReintke',
'woelken',
'UdoBullmann',
'UliMuellerMdEP',
'ViolavonCramon'],
'MEPsGreece': ['AlexisMep',
'AnnaAsimakopoul',
'papadimoulis',
'ElenaKountoura',
'vozemberg',
'e_fragkos',
'EvaKaili',
'GiorgosKyrtsos',
'iwlagos',
'M_Kefalogiannis',
'MariaSpyraki',
'androulakisnick',
'PetrosKokkalis',
'SteliosKoul',
'Kympouropoulos',
'v_meimarakis'],
'MEPsHungary': ['adamkosamep',
'AndorDeli',
'AndreaBocskor',
'donath_anna',
'AraKovacs',
'BalazsHidveghi',
'drcsabamolnar',
'toth_edina',
'GyoriEniko',
'HolvenyiGyorgy',
'istvan_ujhelyi',
'katka_cseh',
'DobrevKlara',
'trocsanyi',
'JarokaLivia',
'GyongyosiMarton',
'sandor_ronai',
'dajcstomi'],
'MEPsIreland': ['BarryAndrewsMEP',
'BillyKelleherEU',
'macmanuschris',
'CiaranCuffe',
'ClareDalyMEP',
'ColmMarkey',
'deirdreclunemep',
'FitzgeraldFrncs',
'GraceOSllvn',
'lukeming',
'MariaWalshEU',
'wallacemick',
'SeanKellyMEP'],
'MEPsItaly': ['PatricielloAldo',
'alebassoMEP',
'ale_moretti',
'alepanzaoff',
'caroppo_andrea',
'cozzolino62',
'AngeloCiocca',
'bonfriscoanna',
'TardinoAnnalisa',
'Rinaldi_euro',
'Antonio_Tajani',
'brandobenifei',
'CarloCalenda',
'FidanzaCarlo',
'CaterinaChinnic',
'chgemma68',
'Dani_Rondinelli',
'DOscarLancini',
'DinoGiarrusso',
'elenalizzi',
'EleonoraEvi',
'gualminielisa',
'FMCastaldo',
'ladyonorato',
'francorobertieu',
'fulviomartuscie',
'GianantonioDaRe',
'giannagancia',
'giulianopisapia',
'giosiferrandino',
'giuseppemilaz11',
'HerbertDorfmann',
'ignaziocorrao',
'itinagli',
'Isa_Adinolfi',
'isatovaglieri',
'LFerraraM5S',
'LuciaVuoloEU',
'luisaregimenti',
'marabizzotto',
'MCampomenosi',
'MarcoDreosto',
'Marcozanni86',
'MarcoZullo',
'MarioFurore',
'MaxSalini',
'maxsmeriglio',
'MassimoCasanov3',
'adinolfi_matteo',
'DantiNicola',
'NProcaccini',
'PaoloBorchia',
'paolodecastro',
'toiapatrizia',
'pfmajorino',
'PediciniM5S',
'bartolopietro1',
'FiocchiPietro',
'pinapic',
'RaffaeleFitto',
'r_stancanelli',
'rosadamato634',
'Rosannaconte_',
'SabriPignedoli',
'salv_de_meo',
'SERGIOBERLATO',
'SardoneSilvia',
'berlusconi',
'SR_Baldassarre',
'simonabonafe',
'ZambelliLega',
'SusannaCeccardi',
'beghin_t',
'ValentinoGrant',
'vincesofo'],
'MEPsLatvia': ['AndrisAmeriks',
'IneseVaidere',
'ijabs',
'nilsusakovs',
'robertszile',
'Kalniete',
'Tatjana_Zdanoka'],
'MEPsLithuania': ['AndriusKubilius',
'maldeikiene',
'BronisRopeLT',
'juozas_olekas',
'petras_petras',
'RJukneviciene',
'ViktorUspaskich'],
'MEPsLuxembourg': ['CharlesGoerens',
'CHansenEU',
'iwiseler',
'marcangel_lu',
'MonicaSemedoLux',
'MetzTilly'],
'MEPsMalta': ['alexagiussaliba',
'SantAlfred',
'engerer',
'DavidCasaMEP',
'josiannecutajar',
'RobertaMetsola'],
'MEPsNetherlands': ['a_jongerius',
'anjahazekamp',
'AnnieSchreijer',
'ToineMandersEP',
'bgroothuis',
'BasEickhout',
'hjaruissen',
'C_Nagtegaal',
'rookmakerdorien',
'Esther_de_Lange',
'jhuitema',
'jeroen_lenaers',
'kimvsparrentak',
'larawoltersEU',
'MalikAzmani',
'MJRLdeGraaff',
'mphoogeveen',
'MChahim',
'paultang',
'petervdalen',
'rooken',
'Rob_Roos',
'samiraraf',
'SophieintVeld',
'thijsreuten',
'Tineke_Strik',
'tbwberendsen',
'Vera_Tax'],
'MEPsPoland': ['AdamBielan',
'JarubasAdam',
'AndrzejHalicki',
'AnnaFotyga_PE',
'AnnaZalewskaMEP',
'Arlukowicz',
'BeataKempa_MEP',
'beatamk',
'BeataSzydlo',
'Bogdan_Rzonca',
'BLiberadzki',
'danutahuebner',
'd_tarczynski',
'elukacijewska',
'E_Rafalska',
'EwaKopacz',
'tobiszowski',
'IzabelaKloc',
'JSaryuszWolski',
'j_wisniewska',
'JanOlbrycht',
'JaninaOchojska',
'J_Lewandowski',
'JaroslawDuda',
'JerzyBuzek',
'jbrudzinski',
'j_kopcinska',
'profKarski',
'KosmaZlotowski',
'Hetman_K',
'LeszekMiller',
'LukaszKohut',
'Adamowicz_Magda',
'profMarekBelka',
'PoselBalt',
'PatrykJaki',
'radeksikorski',
'RobertBiedron',
'rozathun',
'r_czarnecki',
'SylwiaSpurek',
'TFrankowski21',
'TomaszPoreba',
'WaszczykowskiW',
'ZbigniewKuzmiuk',
'ZdzKrasnodebski'],
'MEPsPortugal': ['alvaroamaroEU',
'czorrinho',
'cmonteiroaguiar',
'FGuerreiroMEP',
'IECarvalhais',
'isabel_mep',
'JPimentaLopes',
'joseggusmao',
'JMFernandesEU',
'lidiafopereira',
'MPizarroPorto',
'mmargmarques',
'mgracacarvalho',
'leitaomarquesep',
'mmatias_',
'NunoMeloCDS',
'PauloRangel_pt',
'PedroMarquesMEP',
'sara_saracerdas'],
'MEPsRomania': ['AlinMituta',
'carmenavram',
'CorinaCretuEU',
'CristianTerhes',
'CristianSBusoi',
'CiolosDacian',
'MEPDanielBuda',
'dragos_pislaru',
'IoanDragosT',
'eugen_tomac',
'VladGheorgheNi1',
'IuliuWinkler',
'vinczelorant',
'MariaGrapini',
'MarianMarinescu',
'nicustefanuta',
'RamonaStrugariu',
'RovanaPlumbMM',
'SMuresan',
'tbasescu',
'negrescuvictor',
'VladDanGheorghe',
'botos_vlad'],
'MEPsSlovakia': ['EugenJurzyca',
'IvanStefanec',
'LNicholsonova',
'mhojsik',
'MSimecka',
'MichalWiezik',
'MilanUhrik',
'MiriamMLex',
'radacovskyMEP',
'monika_benova',
'Peter_Pollak',
'RobertHajsel',
'VladoBilcik'],
'MEPsSlovenia': ['Franc_Bogovic',
'IJoveva',
'KGroselj',
'LjudmilaNovak',
'milan_brglez',
'MilanZver',
'RomanaTomc',
'tfajon'],
'MEPsSpain': ['adrianvl1982',
'_AMaldonado_',
'aliciahoms',
'toni_comin',
'TonoEPP',
'KRLS',
'cesarluena',
'ClaraAguilera7',
'claraponsati',
'crismaestre',
'DianaRibaGiner',
'DolorsMM',
'DomenecD',
'EGardiazabal',
'ernesturtasun',
'gonzalezpons',
'estrella_dura',
'MEugeniaRPalop',
'PacoMillanMon',
'GabrielMatoA',
'hermanntertsch',
'Ibangarciadb',
'IdoiaVR',
'RodriguezPinero',
'IratxeGarper',
'IsabelBenjumea',
'isabeldemuel',
'IzaskunBilbaoB',
'javilopezEU',
'J_MorenoSanchez',
'FZarzalejos',
'jonasfernandez',
'jordi_canyas',
'jordisolef',
'Jorgebuxade',
'MargalloJm',
'JRBauza',
'JFLopezAguilar',
'zoidoJI',
'LeopoldoLopezG',
'linagalvezmunoz',
'lugaricano',
'maitepagaza',
'Manu_Abu_Carlos',
'mrossemp',
'MargaPisa',
'sorayarr_',
'MazalyAguilar',
'MiguelUrban',
'MonicaSilvanaG',
'NachoSAmor',
'nicogoncas',
'pernandobarrena',
'delcastillop',
'sirarego',
'susanasolisp'],
'MEPsSweden': ['AbirAlsahlani',
'ArbaKokalari',
'weimers',
'DavidLega',
'emmawiesner',
'erik_bergkvist',
'EvinIncir',
'ilandebasso',
'jakopdalunde',
'jessicapolfjard',
'JessicaStegrud',
'jorgenwarborn',
'JytteGuteland',
'KarinKarlsbro',
'MalinBjork_EU',
'ParHolmgren',
'skyttedal',
'tomastobe']}
df_MEPs.drop(columns=['Unnamed: 0', 'Language'], inplace = True)
On a dataset this large everything took a very long time to run, so let us restrict ourselves to the period after the conflict escalated.
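# .copy() makes the filtered frame independent, so adding columns later does not trigger SettingWithCopyWarning.
df_MEPs = df_MEPs[(df_MEPs['Datetime'] > '2022-02-24')].copy()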
df_MEPs["Text_en"] = df_MEPs['Text'].swifter.apply(en)
df_MEPs.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 11012 entries, 0 to 124045
Data columns (total 12 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Datetime         11012 non-null  object
 1   Tweet Id         11012 non-null  int64
 2   Text             11012 non-null  object
 3   Username         11012 non-null  object
 4   Replies Count    11012 non-null  int64
 5   Retweets Count   11012 non-null  int64
 6   Likes Count      11012 non-null  int64
 7   Quotes Count     11012 non-null  int64
 8   Retweeted Tweet  0 non-null      float64
 9   Quoted Tweet     2666 non-null   object
 10  Mentioned Users  5300 non-null   object
 11  Text_en          11012 non-null  object
dtypes: float64(1), int64(5), object(6)
memory usage: 1.3+ MB
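Note that Datetime is stored as object, that is, as plain strings, so the date filter above compares text rather than timestamps. This works here because the values share one ISO-like format and one UTC offset. The sketch below, on a few timestamps copied from the tables, compares the string filter with a typed one; it is an illustration under that assumption, not a replacement for the notebook's code.

```python
from datetime import datetime, timezone

# Timestamps in the same format as the Datetime column.
stamps = [
    "2022-04-08 06:00:00+00:00",
    "2022-02-24 09:59:30+00:00",
    "2020-03-19 07:27:22+00:00",
]

cutoff_text = "2022-02-24"
cutoff = datetime(2022, 2, 24, tzinfo=timezone.utc)

# Lexicographic filter: a longer string sharing the cutoff prefix sorts after it,
# so every tweet posted on 2022-02-24 is kept.
by_string = [s for s in stamps if s > cutoff_text]

# Typed filter: parse first, then compare timezone-aware datetimes.
by_datetime = [s for s in stamps if datetime.fromisoformat(s) >= cutoff]

print(by_string == by_datetime)
```

If the column ever mixes formats or offsets, the string comparison silently stops being equivalent, so parsing the column once is the safer choice for later time-based analyses.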
doc_lens = df_MEPs["Text_en"].str.len()
doc_lens.hist(log_y=True)
fig, ax = plt.subplots(figsize=(19, 13))
ax.boxplot(doc_lens)
plt.show()
Because the queries take a very long time to run, we restrict ourselves to tweets from the wartime period.
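# Tokens that carry no meaning for the analysis (whitespace, flag fragments, HTML-entity leftovers).
not_interesting = {"\n", "\n\n", "🇺", "🇦", " ", "", '🇷', '👇', 'amp'}
# Keep the lemmas of tokens that are neither stop words, nor punctuation, nor noise.
lemmas = df_MEPs.Text_en.apply(lambda doc: [token.lemma_ for token in doc if not token.is_stop and not token.is_punct and token.lemma_ not in not_interesting])
# Concatenate the per-tweet lists and count every lemma.
word_counts = Counter(lemmas.sum())
counts = pd.DataFrame(word_counts.most_common(60), columns=['word', 'count'])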
cloud_from_lemmas(word_counts)
plot_counts(counts)
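Since running time is a recurring concern here, one step that may contribute to it is concatenating the per-tweet lemma lists by summation, which copies the growing list on every addition. A minimal sketch of a linear-time alternative with `itertools.chain`, on made-up lemma lists rather than the real data:

```python
from collections import Counter
from itertools import chain

# Toy stand-in for the per-tweet lemma lists (made-up values).
lemma_lists = [
    ["ukraine", "support", "eu"],
    ["ukraine", "sanction"],
    ["eu", "ukraine"],
]

# chain.from_iterable walks the lists lazily instead of building
# ever longer intermediate lists.
word_counts = Counter(chain.from_iterable(lemma_lists))
top = word_counts.most_common(2)
print(top)
```

The resulting counts are the same as with list summation; only the way the lists are traversed differs. Whether this matters in practice depends on how much of the total time is spent in the spaCy pipeline itself, which was not measured here.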
not_interesting = {'the', '@', 'a', 'this'}
# Extract noun chunks and keep those made of exactly two tokens.
lemmas_ngrams = df_MEPs.Text_en.apply(lambda doc: list(doc.noun_chunks))
lemmas_ngrams = lemmas_ngrams.apply(lambda x: [str(el) for el in x if len(el) == 2])
# Keep only chunks that also split into exactly two whitespace-separated words.
lemmas_ngrams = lemmas_ngrams.apply(lambda w: [x for x in w if len(x.split())==2])
# Drop chunks that contain an uninformative word.
lemmas_ngrams = lemmas_ngrams.apply(lambda w: [x for x in w if all(part.lower() not in not_interesting for part in x.split())])
word_counts_ngrams = dict(Counter(lemmas_ngrams.sum()).most_common(30))
cloud_from_lemmas(word_counts_ngrams)
# The dict already holds the 30 most common chunks in descending order.
counts_ngrams = pd.DataFrame(list(word_counts_ngrams.items()), columns=['word', 'count'])
plot_counts(counts_ngrams)
The word pairs that appear seem very relevant to our topic. They show that providing support matters and that the aggressor is clearly named in the overall narrative. At the same time, Russia's significance for the energy industry shows up clearly in the public debate.
not_interesting = {'the', '@', 'a', 'this'}
# Extract noun chunks and keep those made of exactly three tokens.
lemmas_ngrams = df_MEPs.Text_en.apply(lambda doc: list(doc.noun_chunks))
lemmas_ngrams = lemmas_ngrams.apply(lambda x: [str(el) for el in x if len(el) == 3])
# Keep only chunks that also split into exactly three whitespace-separated words.
lemmas_ngrams = lemmas_ngrams.apply(lambda w: [x for x in w if len(x.split())==3])
# Drop chunks that contain an uninformative word.
lemmas_ngrams = lemmas_ngrams.apply(lambda w: [x for x in w if all(part.lower() not in not_interesting for part in x.split())])
word_counts_ngrams = dict(Counter(lemmas_ngrams.sum()).most_common(30))
cloud_from_lemmas(word_counts_ngrams)
# The dict already holds the 30 most common chunks in descending order.
counts_ngrams = pd.DataFrame(list(word_counts_ngrams.items()), columns=['word', 'count'])
plot_counts(counts_ngrams)
for x in df_MEPs['Text']:
    if 'your devilish way' in x:
        print(x+'\n')
@mfa_russia @NATO Lies; every day, every hour, every minute, every second. Repent, and turn from your devilish way #PutinWarCriminal @RussianEmbassy @mfa_russia @RusembUkraine @RusEmbUSA @RusMission_EU @BBCWorld @SkyNews @ftworldnews @guardianworld @TelegraphWorld @spectator 200% #fakenews and total lies; every day, every hour, every minute, every second. Repent, and turn from your devilish way #Lavrov #Putin @KremlinRussia 🇷🇺 @RussianEmbassy @mfa_russia @RusembUkraine @AmbRusFrance @RusBotschaft @RusMission_EU @BBCWorld @SkyNews @ftworldnews @TelegraphWorld @guardianworld We in 🇳🇱 know about 🇷🇺 lies because of #MH17 .And here, again; lies => every day, every hour, every minute, every second. Repent, and turn from your devilish way #Putin #Lavrov @KremlinRussia @RussianEmbassy @mfa_russia @RusembUkraine @RusMission_EU @RusEmbUSA @RussiaUN @BBCWorld @REESOxford @SkyNews @ftworldnews @guardianworld Lies; every day, every hour, every minute, every second. Repent, and turn from your devilish way #Putin @KremlinRussia 🇷🇺 @RussianEmbassy @mfa_russia @RusEmbUSA @RusembUkraine @BBCWorld @SkyNews @spectator @NewStatesman @standardnews @REESOxford @RT_com Lies; every day, every hour, every minute, every second. Repent, and turn from your devilish way #Putin #Lavrov @KremlinRussia 🇷🇺 @RussianEmbassy @mfa_russia @RusMission_EU @mission_russian @RusEmbUSA @BBCWorld @SkyNews @ftworldnews @guardianworld @TelegraphWorld @REESOxford Lies; every day, every hour, every minute, every second. Repent, and turn from your devilish way #Lavrov @KremlinRussia 🇷🇺 @RussianEmbassy @mfa_russia @RusembUkraine @RussiaUN @BBCWorld @SkyNews @FT @TelegraphWorld @guardianworld @spectator @NewStatesman Lies; every day, every hour, every minute, every second. 
Repent, and turn from your devilish way #Putin @KremlinRussia 🇷🇺 @RussianEmbassy @mfa_russia @RusembUkraine @RussiaUN @BBCWorld @SkyNews @ftworldnews @TelegraphWorld @guardianworld @spectator @NewStatesman Lies; every day, every hour, every minute, every second. Repent, and turn from your devilish way #Putin @KremlinRussia_E 🇷🇺 @RussianEmbassy Contrary to many houses in 🇺🇦 ,this house in 🇷🇺 still stands! Contrary to many men, women and children being bombed and killed by #Putin in 🇺🇦 ,people here can go in and out as they wish. Repent @KremlinRussia and turn from your devilish way @RussianEmbassy @mfa_russia @RusembUkraine @RF_OSCE @mission_russian @BBCWorld @SkyNews @guardian @Telegraph @ftworldnews @REESOxford Lies; every day, every hour, every minute, every second! Repent, and turn from your devilish way #Putin @KremlinRussia 🇷🇺 #Lavrov @RussianEmbassy @mfa_russia @RussiaUN @BBCWorld @SkyNews @ftworldnews @TelegraphWorld @guardianworld @TheEconomist @spectator @business Lies; every day, every hour, every minute, every second. Repent, and turn from your devilish way #Putin @KremlinRussia_E Rusland 🇷🇺 @RussianEmbassy @RussiaUN @mfa_russia @RusembUkraine @BBCWorld @SkyNews @ftworldnews @TelegraphWorld @guardianworld @spectator @REESOxford Lies; every day, every hour, every minute, every second. Repent, and turn from your devilish way #Putin @KremlinRussia_E 🇷🇺 @RussianEmbassy @KremlinRussia_E @mfa_russia @BBCWorld @SkyNews @ftworldnews @TelegraphWorld @guardianworld @spectator @NewStatesman @RT_com Lies; every day, every hour, every minute, every second. Repent, and turn from your devilish way #Putin @KremlinRussia_E 🇷🇺 @RussianEmbassy @guardian @RussiaUN @mfa_russia @mod_russia @RusembUkraine @rusembassynl @RT_com @RusEmbUSA @SkyNews @BBCWorld 200% fake news. Lies: every day, every hour, every minute, every second. 
Repent, and turn from your devilish way #Putin @KremlinRussia_E 🇷🇺 @RussianEmbassy @KremlinRussia_E @mfa_russia @BBCWorld @SkyNews @ftworldnews @guardianworld @TelegraphWorld @GBNEWS @MailOnline @DailyMirror Lies; every day, every hour, every minute, every second! Repent, and turn from your devilish way #Putin @KremlinRussia_E 🇷🇺 @RussianEmbassy @mfa_russia @RussiaUN @mission_russian @RusEmbUSA @natomission_ru @BBCWorld @REESOxford @guardianworld @spectator @NewStatesman Lies; every day, every hour, every minute, every second. Repent, and turn from your devilish way #Lavrov #Putin @KremlinRussia_E 🇷🇺 @RussianEmbassy @nytimes @mfa_russia @RusEmbUSA @natomission_ru @BBCWorld @FinancialTimes @guardian @Telegraph @RT_com @REESOxford I prefer a cold war instead of a hot Russian war with the bombing and killing of innocent people in 🇺🇦. Stop your lies and repent, and turn from your devilish way #Putin @KremlinRussia_E @RussianEmbassy @mfa_russia @RusembUkraine @RusEmbUSA @BBCWorld @TelegraphWorld @guardianworld @spectator @NewStatesman @REESOxford @standardnews Lies; every day, every hour, every minute, every second. Repent, and turn from your devilish way #Lavrov 🇷🇺 @KremlinRussia @RussianEmbassy @mfa_russia @RusEmbUSA @RusMission_EU @BBCWorld @ftworldnews @TelegraphWorld @guardianworld @TheEconomist @business @standardnews Allies such as North Korea, Belarus, Eritrea, Syria?! The scum of the world is gathering with 🇷🇺 #Russia #Lavrov @KremlinRussia .Repent, and turn from your devilish way @RussianEmbassy Mir & pravda?? Lies; every day, every hour, every minute, every second. Repent, and turn from your devilish way @RussianEmbassy @mfa_russia @RussiaUN @RusMission_EU @FCDOGovUK @10DowningStreet @BBCWorld @bbcrussian @FT @thesundaytimes @Telegraph Lies; every day, every hour, every minute, every second. 
Repent and turn from your devilish way @RussianEmbassy @mfa_russia @mod_russia @FCDOGovUK @10DowningStreet @RussiaUN @EmbassyofRussia @RF_OSCE @RusMission_EU @BBCWorld @CGTNOfficial Lies by #Putin @KremlinRussia_E every day, every hour, every minute, every second! Repent, and turn from your devilish way @RussianEmbassy @KremlinRussia_E @mfa_russia @RusEmbUSA @RusMission_EU @BBCWorld @SkyNews @GBNEWS @FinancialTimes @guardian @Telegraph THX! The best joke 😭 since ages! Inequality?! Ever heard of Russian 🇷🇺 oligarchs who own many houses and super yachts worth up to hundreds of millions of euros?! Repent, and turn from your devilish way @RussianEmbassy @KremlinRussia_E @mfa_russia @BBCWorld @SkyNews @FinancialTimes @TheEconomist @business @RusEmbUSA @RusMission_EU @RT_com What have you stolen #Putin in 🇺🇦?! Lives; many lives, lives of men, women and children. Now everybody knows that innocent lives have been stolen by the brutal dictator @KremlinRussia_E . Repent, and turn from your devilish way! @RussianEmbassy @KremlinRussia_E @mfa_russia @RussiaUN @RusEmbUSA @RusMission_EU @mission_russian @BBCWorld @SkyNews @ftworldnews @guardianworld Lies and killing by #Putin every day, every hour, every minute, every second. Repent @KremlinRussia_E and turn from your devilish way @RussianEmbassy The church that supports dictator Putin @KremlinRussia_E ?! Repent and turn from your devilish way! @RussianEmbassy @mfa_russia @RusembUkraine @BBCWorld @ftworldnews @guardianworld @TelegraphWorld @standardnews @MailOnline @Daily_Express @REESOxford Just lies from 🇷🇺 and @Kermlin_e .Every day, every hour, every minute, every second: only lies. Repent and turn from your devilish way that kills innocent people from 🇺🇦 @RussianEmbassy @UN @RussiaUN @mfa_russia @RusembUkraine @BBCWorld @guardianworld @ftworldnews @TelegraphWorld @standardnews @MailOnline @Daily_Express You are complete fools who lie every day, every hour, every minute, every second! Repent and turn from your devilish way
Putting the same phrase into many tweets was popular and created a kind of trend. This also explains the popularity of n-grams such as "every minute", "every second", etc., visible two charts above.
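To judge whether such a repeated phrase reflects many authors or only a few very active accounts, one can count the distinct usernames behind it. The sketch below uses made-up (username, text) pairs; `phrase_spread` and the placeholder usernames are hypothetical and not part of the notebook.

```python
from typing import Iterable, Tuple


def phrase_spread(tweets: Iterable[Tuple[str, str]], phrase: str) -> Tuple[int, int]:
    """Return (matching tweets, distinct authors) for a case-insensitive phrase."""
    needle = phrase.lower()
    authors = set()
    matches = 0
    for username, text in tweets:
        if needle in text.lower():
            matches += 1
            authors.add(username)
    return matches, len(authors)


# Made-up (username, text) pairs; the usernames are placeholders.
sample = [
    ("user_a", "Lies; every day, every hour. Repent, and turn from your devilish way"),
    ("user_a", "Repent and turn from your devilish way!"),
    ("user_b", "We must now ensure safe passage."),
]
print(phrase_spread(sample, "your devilish way"))
```

On the notebook's frame the same function could be called as `phrase_spread(zip(df_MEPs['Username'], df_MEPs['Text']), 'your devilish way')`. A high tweet count with a low author count would suggest counting each n-gram once per author before interpreting the charts.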
top_10_likes = df_MEPs.sort_values(by=['Likes Count'], ascending=False).head(10).Text
for l in top_10_likes: print(l+'\n\n')
🇨🇦🇬🇧|Yesterday, Canada's Prime Minister @JustinTrudeau visited the #EU Parliament to give a speech. I took the opportunity to give him an appropriate "welcome" there. Short, concise and right hitting the bull's eye! #ID https://t.co/qpcQyGTixQ Your strategy of incremental sanctions doesn’t work. Cannot work… That’s why 212 members of Parliament demand a special #EUCO meeting to decide on full sanctions immediately! My speech👇🏻 https://t.co/MFCtmboaf4 THREAD 1/7 Intel from a Ukrainian officer about a meeting in Putin’s lair in Urals. Oligarchs convened there so no one would flee. Putin is furious, he thought that the whole war would be easy and everything would be done in 1-4 days. @EPPGroup @general_ben @edwardlucas @politico https://t.co/8AoelUDWM9 PM Trudeau, in recent months, under your quasi-liberal boot, Canada 🇨🇦 has become a symbol of civil rights violations. The methods we have witnessed may be liberal to you, but to many citizens around the🌎it seemed like a dictatorship of the worst kind. https://t.co/FZuc6aDZ1I BREAKING - European Parliament want immediate and full fossil fuel embargo ! No gas, no oil, no coal… NO European money for Putin’s army ! 🛑 https://t.co/sh46e1ehiS Meanwhile in Afghanistan... tens of thousands seeking refuge; five million children facing famine; 500% increase in child marriages; children being sold to feed families... Not a mention of it. My god, they must be wondering what makes their humanitarian crisis so unimportant. https://t.co/VtDk5awWwk 7/7 The Ukrainians must avoid panic! The missile strikes are for intimidation, the Russians fire them at random to “accidentally” hit residential buildings to make the attack look larger than it really is. Ukraine must stay strong and we must provide assistance! #StandWithUkraine Again, I have received many messages from people in the UK apologising for their Prime Minister. 
Trust me: As much as I understand the sentiment, we never forgot that there are millions of people in the UK who disagree with him. The UK is much more than its current PM. 🇪🇺❤️🇬🇧 8/7 Spread this information so the world would realise how important it is to assist Ukraine right now and without hesitation! It is difficult for Russia, but it is difficult for Ukraine as well if the West does not provide meaningul support! @EPPGroup @MFAestonia @MoD_Estonia 6/7 Russia’s whole plan relies on panic – that the civilians and armed forces surrender and Zelensky flees. They expect Kharkiv to surrender first so the other cities would follow suit to avoid bloodshed. The Russians are in shock of the fierce resistance they have encountered.
Above is a rather interesting list of the 10 most-liked tweets.
'Your strategy of incremental sanctions doesn’t work. Cannot work…'
For some of them, such as the one quoted here, we can be almost certain of the connection to the situation in Ukraine. Keep in mind, however, that the dataset currently contains many MEP tweets on topics other than Ukraine. Filtering was applied only by date, not by keywords.
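A keyword filter could be layered on top of the date filter later. The sketch below shows one possible shape of such a filter; the keyword stems and the `is_on_topic` helper are assumptions chosen for illustration, not the list the project will use.

```python
import re

# Hypothetical keyword stems; a stem also matches longer forms
# such as "sanctions" or "Russian".
KEYWORDS = ["ukrain", "russia", "putin", "sanction"]

# \b anchors each stem at the start of a word; matching ignores case.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")",
    re.IGNORECASE,
)


def is_on_topic(text: str) -> bool:
    """Return True if the text contains any of the keyword stems."""
    return pattern.search(text) is not None


# Short texts taken from the tables and outputs in this notebook.
examples = [
    "Your strategy of incremental sanctions doesn’t work. Cannot work…",
    "@Kribberg Tack!",
    "There a no limits.",
]
print([is_on_topic(t) for t in examples])
```

On the notebook's frame this could be applied as `df_MEPs[df_MEPs['Text'].apply(is_on_topic)]`. A stem list like this trades precision for recall, so a sample of the rejected tweets is worth reading before trusting the filter.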
df_MEPs
| | Datetime | Tweet Id | Text | Username | Replies Count | Retweets Count | Likes Count | Quotes Count | Retweeted Tweet | Quoted Tweet | Mentioned Users | Text_en | country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-04-08 06:00:00+00:00 | 1512309265641480192 | If I knew someone needed help, I would help th... | SCHIEDER | 0 | 0 | 4 | 0 | NaN | NaN | ['ZiobroPL'] | (If, I, knew, someone, needed, help, ,, I, wou... | MEPsAustria |
| 1 | 2022-04-07 15:55:58+00:00 | 1512096856410574851 | EU, wake up! Don't leave West Balkans to Russi... | SCHIEDER | 0 | 0 | 5 | 0 | NaN | NaN | NaN | (EU, ,, wake, up, !, Do, n't, leave, West, Bal... | MEPsAustria |
| 2 | 2022-04-07 10:32:25+00:00 | 1512015431866978305 | Read here more how Europe must not disappoint ... | SCHIEDER | 0 | 0 | 1 | 0 | NaN | NaN | NaN | (Read, here, more, how, Europe, must, not, dis... | MEPsAustria |
| 3 | 2022-04-07 07:20:02+00:00 | 1511967017779224577 | Thank you for the good exchange about the curr... | SCHIEDER | 0 | 4 | 26 | 1 | NaN | NaN | ['anlifirat', 'bediaozgokce'] | (Thank, you, for, the, good, exchange, about, ... | MEPsAustria |
| 4 | 2022-04-06 12:29:20+00:00 | 1511682465362059269 | EU needs to be more active to end this crisis,... | SCHIEDER | 1 | 0 | 3 | 0 | NaN | NaN | NaN | (EU, needs, to, be, more, active, to, end, thi... | MEPsAustria |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 124041 | 2022-02-27 11:02:16+00:00 | 1497889816649904130 | More than 360 000 Ukrainians now displaced in ... | tomastobe | 1 | 8 | 24 | 0 | NaN | NaN | NaN | (More, than, 360, 000, Ukrainians, now, displa... | None |
| 124042 | 2022-02-25 20:18:03+00:00 | 1497304908940333061 | Today Russia threatened Sweden 🇸🇪and Finland 🇫... | tomastobe | 7 | 74 | 503 | 1 | NaN | NaN | NaN | (Today, Russia, threatened, Sweden, 🇸, 🇪, and,... | None |
| 124043 | 2022-02-24 13:25:30+00:00 | 1496838698188611588 | Ukraine urgently needs medical aid items follo... | tomastobe | 2 | 6 | 19 | 0 | NaN | https://twitter.com/JanezLenarcic/status/14968... | ['JanezLenarcic'] | (Ukraine, urgently, needs, medical, aid, items... | None |
| 124044 | 2022-02-24 10:00:10+00:00 | 1496787027156627459 | The geopolitical realities in our neighbourhoo... | tomastobe | 0 | 1 | 11 | 0 | NaN | NaN | NaN | (The, geopolitical, realities, in, our, neighb... | None |
| 124045 | 2022-02-24 09:59:30+00:00 | 1496786857455149056 | We must now ensure safe passage and effective ... | tomastobe | 3 | 4 | 21 | 0 | NaN | NaN | NaN | (We, must, now, ensure, safe, passage, and, ef... | None |
11012 rows × 13 columns
df_MEPs['country'] = None  # placeholder for every row, without hard-coding the row count
C:\Users\jakub\AppData\Local\Temp/ipykernel_1476/2147309972.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
# Look up each author's country by index label (the index is not contiguous,
# so positional .iloc[i] would point at a different row); authors found in
# none of the lists keep None.
for i in df_MEPs.index:
    username = df_MEPs.loc[i, 'Username']
    matches = [c_name for c_name, MEPs_names in MEPs_dict.items() if username in MEPs_names]
    if matches:
        df_MEPs.loc[i, 'country'] = matches[0]
C:\Users\jakub\AppData\Local\Temp/ipykernel_1476/2082221353.py:10: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
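The row-by-row country lookup can also be done through an inverted dictionary, which avoids scanning every country list for every tweet. The sketch below assumes `MEPs_dict` maps a country label to a collection of usernames; the miniature dictionary and the usernames in it are hypothetical.

```python
def invert_membership(groups):
    """Turn {country: [usernames]} into {username: country}."""
    lookup = {}
    for country, usernames in groups.items():
        for name in usernames:
            # Keep the first country a username appears under.
            lookup.setdefault(name, country)
    return lookup

# Hypothetical stand-in for MEPs_dict.
example_dict = {
    'MEPsA': ['alice', 'bob'],
    'MEPsB': ['carol', 'bob'],
}
lookup = invert_membership(example_dict)

usernames = ['alice', 'bob', 'carol', 'unknown_user']
countries = [lookup.get(u) for u in usernames]
print(countries)
```

With pandas this would become `df_MEPs['country'] = df_MEPs['Username'].map(lookup)`; note that `Series.map` leaves `NaN` rather than `None` for usernames missing from the dictionary.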
df_MEPs
| | Datetime | Tweet Id | Text | Username | Replies Count | Retweets Count | Likes Count | Quotes Count | Retweeted Tweet | Quoted Tweet | Mentioned Users | Text_en | country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-04-08 06:00:00+00:00 | 1512309265641480192 | If I knew someone needed help, I would help th... | SCHIEDER | 0 | 0 | 4 | 0 | NaN | NaN | ['ZiobroPL'] | (If, I, knew, someone, needed, help, ,, I, wou... | MEPsAustria |
| 1 | 2022-04-07 15:55:58+00:00 | 1512096856410574851 | EU, wake up! Don't leave West Balkans to Russi... | SCHIEDER | 0 | 0 | 5 | 0 | NaN | NaN | NaN | (EU, ,, wake, up, !, Do, n't, leave, West, Bal... | MEPsAustria |
| 2 | 2022-04-07 10:32:25+00:00 | 1512015431866978305 | Read here more how Europe must not disappoint ... | SCHIEDER | 0 | 0 | 1 | 0 | NaN | NaN | NaN | (Read, here, more, how, Europe, must, not, dis... | MEPsAustria |
| 3 | 2022-04-07 07:20:02+00:00 | 1511967017779224577 | Thank you for the good exchange about the curr... | SCHIEDER | 0 | 4 | 26 | 1 | NaN | NaN | ['anlifirat', 'bediaozgokce'] | (Thank, you, for, the, good, exchange, about, ... | MEPsAustria |
| 4 | 2022-04-06 12:29:20+00:00 | 1511682465362059269 | EU needs to be more active to end this crisis,... | SCHIEDER | 1 | 0 | 3 | 0 | NaN | NaN | NaN | (EU, needs, to, be, more, active, to, end, thi... | MEPsAustria |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 124041 | 2022-02-27 11:02:16+00:00 | 1497889816649904130 | More than 360 000 Ukrainians now displaced in ... | tomastobe | 1 | 8 | 24 | 0 | NaN | NaN | NaN | (More, than, 360, 000, Ukrainians, now, displa... | None |
| 124042 | 2022-02-25 20:18:03+00:00 | 1497304908940333061 | Today Russia threatened Sweden 🇸🇪and Finland 🇫... | tomastobe | 7 | 74 | 503 | 1 | NaN | NaN | NaN | (Today, Russia, threatened, Sweden, 🇸, 🇪, and,... | None |
| 124043 | 2022-02-24 13:25:30+00:00 | 1496838698188611588 | Ukraine urgently needs medical aid items follo... | tomastobe | 2 | 6 | 19 | 0 | NaN | https://twitter.com/JanezLenarcic/status/14968... | ['JanezLenarcic'] | (Ukraine, urgently, needs, medical, aid, items... | None |
| 124044 | 2022-02-24 10:00:10+00:00 | 1496787027156627459 | The geopolitical realities in our neighbourhoo... | tomastobe | 0 | 1 | 11 | 0 | NaN | NaN | NaN | (The, geopolitical, realities, in, our, neighb... | None |
| 124045 | 2022-02-24 09:59:30+00:00 | 1496786857455149056 | We must now ensure safe passage and effective ... | tomastobe | 3 | 4 | 21 | 0 | NaN | NaN | NaN | (We, must, now, ensure, safe, passage, and, ef... | None |
11012 rows × 13 columns
df_Germany = df_MEPs[(df_MEPs['country'] == 'MEPsGermany')]
df_Germany
| | Datetime | Tweet Id | Text | Username | Replies Count | Retweets Count | Likes Count | Quotes Count | Retweeted Tweet | Quoted Tweet | Mentioned Users | Text_en | country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1971 | 2022-02-24 18:31:00+00:00 | 1496915582565900296 | Indeed. https://t.co/kvB5dVATiN | Assita_Kanko | 3 | 4 | 22 | 0 | NaN | https://twitter.com/amanpour/status/1496915136... | NaN | (Indeed, ., https://t.co/kvB5dVATiN) | MEPsGermany |
| 1972 | 2022-02-24 18:30:36+00:00 | 1496915480858312709 | @amanpour @jensstoltenberg Indeed. | Assita_Kanko | 0 | 0 | 1 | 0 | NaN | NaN | ['amanpour', 'jensstoltenberg'] | (@amanpour, @jensstoltenberg, Indeed, .) | MEPsGermany |
| 1973 | 2022-02-24 17:25:27+00:00 | 1496899085663711238 | Living without war is a blessing we easily tak... | Assita_Kanko | 1 | 9 | 72 | 0 | NaN | NaN | NaN | (Living, without, war, is, a, blessing, we, ea... | MEPsGermany |
| 1974 | 2022-02-24 15:08:46+00:00 | 1496864685555130369 | "Many Ukrainians are ready to fight”, said my ... | Assita_Kanko | 0 | 5 | 13 | 1 | NaN | NaN | NaN | (", Many, Ukrainians, are, ready, to, fight, ”... | MEPsGermany |
| 1975 | 2022-02-24 12:48:41+00:00 | 1496829435739197440 | @paepeilse @HowcanImakethi1 Nope nothing to do... | Assita_Kanko | 4 | 0 | 1 | 0 | NaN | NaN | ['paepeilse', 'HowcanImakethi1'] | (@paepeilse, @HowcanImakethi1, Nope, nothing, ... | MEPsGermany |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4305 | 2022-02-24 12:58:29+00:00 | 1496831900031545353 | Tonight at 18:30/19:00👉Extraordinary @EP_Forei... | hildevautmans | 0 | 2 | 3 | 0 | NaN | NaN | ['EP_ForeignAff', 'EP_Defence', 'JosepBorrellF... | (Tonight, at, 18:30/19:00, 👉, Extraordinary, @... | MEPsGermany |
| 4306 | 2022-02-24 12:10:46+00:00 | 1496819893764825089 | Enough wake-up calls! The EU must urgently dou... | hildevautmans | 0 | 0 | 2 | 0 | NaN | NaN | NaN | (Enough, wake, -, up, calls, !, The, EU, must,... | MEPsGermany |
| 4307 | 2022-02-24 08:42:25+00:00 | 1496767459495497728 | War in Europe, simply because of illusions of ... | hildevautmans | 0 | 1 | 16 | 0 | NaN | NaN | NaN | (War, in, Europe, ,, simply, because, of, illu... | MEPsGermany |
| 4308 | 2022-02-24 07:56:50+00:00 | 1496755988266897408 | The European Parliament must and shall play it... | hildevautmans | 1 | 1 | 7 | 0 | NaN | NaN | ['EP_ForeignAff', 'EP_Defence', 'Europarl_EN'] | (The, European, Parliament, must, and, shall, ... | MEPsGermany |
| 4309 | 2022-02-24 07:16:51+00:00 | 1496745924755738624 | This is a sad moment in history. War in Europe... | hildevautmans | 10 | 44 | 189 | 2 | NaN | NaN | NaN | (This, is, a, sad, moment, in, history, ., War... | MEPsGermany |
250 rows × 13 columns
not_interesting = set(["\n", "\n\n", "🇺", "🇦", " ", "", '🇷', '👇', 'amp'])
lemmas_G = df_Germany.Text_en.apply(lambda doc: [token.lemma_ for token in doc if not token.is_stop if not token.is_punct if not token.lemma_ in not_interesting])
word_counts_G = Counter(lemmas_G.sum())
counts_G = pd.DataFrame(word_counts_G.most_common(60), columns=['word', 'count'])  # top 60 lemmas of this subset
cloud_from_lemmas(word_counts_G)
plot_counts(counts_G)
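The counting step above flattens the per-tweet lemma lists with `.sum()`, which builds a new list at every step and can get slow on larger subsets. A possible alternative, sketched here with plain Python lists instead of spaCy documents, chains the lists lazily; `count_tokens` and its parameters are hypothetical names, not part of the notebook.

```python
from collections import Counter
from itertools import chain

def count_tokens(token_lists, ignore=frozenset(), top_n=60):
    """Count tokens across many lists, skipping ignored ones, and return the top_n."""
    counts = Counter(
        tok for tok in chain.from_iterable(token_lists) if tok not in ignore
    )
    return counts.most_common(top_n)

# Made-up lemma lists standing in for the per-tweet output of the lambda above.
example_lemmas = [['war', 'eu'], ['war', 'amp'], ['eu', 'war']]
top = count_tokens(example_lemmas, ignore={'amp'}, top_n=2)
print(top)
```

The result has the same shape as `most_common(60)` and could be passed to `pd.DataFrame(..., columns=['word', 'count'])` as in the cell above.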
not_interesting = {'the', '@', 'a', 'this'}
lemmas_ngrams = df_Germany.Text_en.apply(lambda doc: list(doc.noun_chunks))
lemmas_ngrams = lemmas_ngrams.apply(lambda x: [''.join(str(el)) for el in x if len(el) == 2])
lemmas_ngrams = lemmas_ngrams.apply(lambda w: [x for x in w if len(x.split())==2])
lemmas_ngrams = lemmas_ngrams.apply(lambda w: [x for x in w if x.split()[0].lower() not in not_interesting if x.split()[1].lower() not in not_interesting])
word_counts_ngrams = dict(Counter(lemmas_ngrams.sum()).most_common(30))
cloud_from_lemmas(word_counts_ngrams)
counts_ngrams = pd.DataFrame(Counter({k: v for k, v in word_counts_ngrams.items()}).most_common(60), columns=['word', 'count'])
plot_counts(counts_ngrams)
not_interesting = {'the', '@', 'a', 'this'}
lemmas_ngrams = df_Germany.Text_en.apply(lambda doc: list(doc.noun_chunks))
lemmas_ngrams = lemmas_ngrams.apply(lambda x: [''.join(str(el)) for el in x if len(el) == 3])
lemmas_ngrams = lemmas_ngrams.apply(lambda w: [x for x in w if len(x.split())==3])
lemmas_ngrams = lemmas_ngrams.apply(lambda w: [x for x in w if x.split()[0].lower() not in not_interesting if x.split()[1].lower() not in not_interesting if x.split()[2].lower() not in not_interesting])
word_counts_ngrams = dict(Counter(lemmas_ngrams.sum()).most_common(30))
cloud_from_lemmas(word_counts_ngrams)
counts_ngrams = pd.DataFrame(Counter({k: v for k, v in word_counts_ngrams.items()}).most_common(60), columns=['word', 'count'])
plot_counts(counts_ngrams)  # every count is only 1
top_10_likes = df_Germany.sort_values(by=['Likes Count'], ascending=False).head(10).Text  # the 10 most liked tweets
for text in top_10_likes: print(text + '\n\n')
Your strategy of incremental sanctions doesn’t work. Cannot work… That’s why 212 members of Parliament demand a special #EUCO meeting to decide on full sanctions immediately! My speech👇🏻 https://t.co/MFCtmboaf4

BREAKING - European Parliament want immediate and full fossil fuel embargo ! No gas, no oil, no coal… NO European money for Putin’s army ! 🛑 https://t.co/sh46e1ehiS

Johnson's comparaison of the courageous fight of Ukraine with Brexit is insane… Brexit was about undoing freedoms and leaving the EU…Ukrainians want more freedom and to join the EU! https://t.co/DOtaNTpiWN

Ukrainian people ask for EU membership, they reject the tyranny & violence of Putin's regime. Tomorrow, the European Parliament will ask that #Ukraine is declared a candidate country. 🇺🇦🇪🇺 #StandWithUkraine https://t.co/SoTn29OWmd

After #Bucha how can European leaders justify continuing to buy Russian gas and finance Putin’s criminal war machine ? Full ban immediately ! https://t.co/DrwZnYFCA1

The yachts of oligarchs have been seized in France and Germany. London is paved with the bloody gold of Russian billionaires close to Putin's regime. When will Johnson act? #StandWithUkraine️ https://t.co/AfqI2DGWMy

The idea of Brexit was to cut red tape..but Britain has become the red tape capital of the Europe! Britain is the only big country in Europe where Ukrainians need visas. British people want to help refugees,but the government is terrified of migration! https://t.co/9DJYU7nDxe

Amidst the horror and the mass murder of civilians by Putin we need a complete EMBARGO on Russian gas and oil immediately & cut of ALL Russian banks from SWIFT ! https://t.co/i8rBtc3em0

President @ZelenskyyUa and his party Sluha Narodu are entering to the European liberal-democratic family ! A small but in current circumstances a highly significant act of recognition. 🇪🇺 https://t.co/EzopoJqDKF

Macron wins first round… but the real battle has only begun. 2 weeks to keep Putin’s allies away from the Elysée ! 2 weeks to strengthen liberté, égalité, fraternité 🇫🇷 against authoritarianism and hatred ! ✊🏻 https://t.co/tatogytPDR
df_Germany = df_MEPs[(df_MEPs['country'] == 'MEPsPoland')]  # NOTE: the name df_Germany is reused here for the Poland subset
df_Germany
| | Datetime | Tweet Id | Text | Username | Replies Count | Retweets Count | Likes Count | Quotes Count | Retweeted Tweet | Quoted Tweet | Mentioned Users | Text_en | country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8000 | 2022-03-17 20:18:24+00:00 | 1504552752843833346 | Unprecedented political revenge is incompatibl... | Emil_Radev | 6 | 2 | 4 | 1 | NaN | NaN | NaN | (Unprecedented, political, revenge, is, incomp... | MEPsPoland |
| 8008 | 2022-04-11 10:58:36+00:00 | 1513471574120316932 | Multilateralism and collective #security among... | EvaMaydell | 0 | 2 | 7 | 0 | NaN | https://twitter.com/thetimes/status/1513415540... | NaN | (Multilateralism, and, collective, #, security... | MEPsPoland |
| 8009 | 2022-04-08 16:39:36+00:00 | 1512470223793889289 | Russian #Cyberattacks and airspace violations ... | EvaMaydell | 1 | 16 | 28 | 1 | NaN | NaN | NaN | (Russian, #, Cyberattacks, and, airspace, viol... | MEPsPoland |
| 8010 | 2022-04-08 11:30:42+00:00 | 1512392488341934092 | The bombing of #Kramatorsk train station is ho... | EvaMaydell | 0 | 3 | 13 | 0 | NaN | https://twitter.com/Reuters/status/15123516882... | NaN | (The, bombing, of, #, Kramatorsk, train, stati... | MEPsPoland |
| 8011 | 2022-04-08 09:14:18+00:00 | 1512358161214328833 | Interesting read from @POLITICOEurope \n\nLook... | EvaMaydell | 0 | 1 | 3 | 0 | NaN | https://twitter.com/POLITICOEurope/status/1512... | ['POLITICOEurope'] | (Interesting, read, from, @POLITICOEurope, \n\... | MEPsPoland |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8137 | 2022-02-24 16:40:59+00:00 | 1496887894375600129 | We didn't do it in 2014 and now we regret it. ... | EvaMaydell | 1 | 7 | 26 | 0 | NaN | https://twitter.com/ZelenskyyUa/status/1496876... | NaN | (We, did, n't, do, it, in, 2014, and, now, we,... | MEPsPoland |
| 8138 | 2022-02-24 15:46:48+00:00 | 1496874256591994880 | Absolutely heartbreaking to think of the child... | EvaMaydell | 1 | 5 | 35 | 0 | NaN | NaN | NaN | (Absolutely, heartbreaking, to, think, of, the... | MEPsPoland |
| 8139 | 2022-02-24 11:51:00+00:00 | 1496814916149424134 | Ahead of #EUCO tonight: It’s no longer about w... | EvaMaydell | 4 | 4 | 13 | 0 | NaN | NaN | NaN | (Ahead, of, #, EUCO, tonight, :, It, ’s, no, l... | MEPsPoland |
| 8140 | 2022-02-24 09:16:05+00:00 | 1496775930827661312 | “One lie is enough to question all truths”\n\n... | EvaMaydell | 0 | 8 | 21 | 0 | NaN | https://twitter.com/jseldin/status/14967072364... | NaN | (“, One, lie, is, enough, to, question, all, t... | MEPsPoland |
| 8141 | 2022-02-24 08:54:14+00:00 | 1496770431637856257 | An independent, sovereign European country is ... | EvaMaydell | 0 | 7 | 15 | 0 | NaN | NaN | NaN | (An, independent, ,, sovereign, European, coun... | MEPsPoland |
135 rows × 13 columns
not_interesting = set(["\n", "\n\n", "🇺", "🇦", " ", "", '🇷', '👇', 'amp'])
lemmas_G = df_Germany.Text_en.apply(lambda doc: [token.lemma_ for token in doc if not token.is_stop if not token.is_punct if not token.lemma_ in not_interesting])
word_counts_G = Counter(lemmas_G.sum())
counts_G = pd.DataFrame(Counter({k: v for k, v in word_counts_G.items()}).most_common(60), columns=['word', 'count'])
cloud_from_lemmas(word_counts_G)
plot_counts(counts_G)
not_interesting = {'the', '@', 'a', 'this'}
lemmas_ngrams = df_Germany.Text_en.apply(lambda doc: list(doc.noun_chunks))
lemmas_ngrams = lemmas_ngrams.apply(lambda x: [''.join(str(el)) for el in x if len(el) == 2])
lemmas_ngrams = lemmas_ngrams.apply(lambda w: [x for x in w if len(x.split())==2])
lemmas_ngrams = lemmas_ngrams.apply(lambda w: [x for x in w if x.split()[0].lower() not in not_interesting if x.split()[1].lower() not in not_interesting])
word_counts_ngrams = dict(Counter(lemmas_ngrams.sum()).most_common(30))
cloud_from_lemmas(word_counts_ngrams)
counts_ngrams = pd.DataFrame(Counter({k: v for k, v in word_counts_ngrams.items()}).most_common(60), columns=['word', 'count'])
plot_counts(counts_ngrams)
not_interesting = {'the', '@', 'a', 'this'}
lemmas_ngrams = df_Germany.Text_en.apply(lambda doc: list(doc.noun_chunks))
lemmas_ngrams = lemmas_ngrams.apply(lambda x: [''.join(str(el)) for el in x if len(el) == 3])
lemmas_ngrams = lemmas_ngrams.apply(lambda w: [x for x in w if len(x.split())==3])
lemmas_ngrams = lemmas_ngrams.apply(lambda w: [x for x in w if x.split()[0].lower() not in not_interesting if x.split()[1].lower() not in not_interesting if x.split()[2].lower() not in not_interesting])
word_counts_ngrams = dict(Counter(lemmas_ngrams.sum()).most_common(30))
cloud_from_lemmas(word_counts_ngrams)
counts_ngrams = pd.DataFrame(Counter({k: v for k, v in word_counts_ngrams.items()}).most_common(60), columns=['word', 'count'])
plot_counts(counts_ngrams)  # every count is only 1
top_10_likes = df_Germany.sort_values(by=['Likes Count'], ascending=False).head(10).Text  # the 10 most liked tweets
for text in top_10_likes: print(text + '\n\n')
Talking with @Google CEO @sundarpichai in Brussels today. In times of crisis, #digital platforms must use their powers for good. Google’s efforts to help #Ukraine are welcomed. We must continue to do more and to utilise the power of #AI and digital solutions. https://t.co/ElNLp9XX4g

“Our task is to break the wall of lies - we need to mobilize out technological potential to win the war of truth. Global internet platforms have a huge role to play.” Another strong call to act against #Putin’s lies and disinformation by PM @kajakallas 👏🇺🇦 #StandWithUkraine https://t.co/S7511XE0vq

“Prove that you are with us, prove you will not let us go, prove you are indeed European. Then life will win over death, and light will win over darkness. Glory to Ukraine.” This may be David versus Goliath, but it is @ZelenskyyUa who will be remembered as a giant of our time. https://t.co/JVG179drVA

Delighted to be @EPPGroup rapporter on the Chips Act. No time to waste in building our digital future. The #ChipsAct must help make 🇪🇺 a global powerhouse in design & manufacturing. Thank you for your trust. Let’s build global leadership & make the #semiconductors of tomorrow. https://t.co/0RPDIoEg8w

“Without European support, we cannot survive. Unity is our power and our only hope." @Vitaliy_Klychko at today’s @EPPGroup meeting. We will not let you down. #StandWithUkraine 🇺🇦 https://t.co/JsGBTOQhgR

The #ChipsAct is Europe’s opportunity to react to the challenges of today and lead our economy towards the potential of tomorrow. Excellent meeting with @ThierryBreton on this and other topics. Thank you for your leadership in propelling forward 🇪🇺 industry. https://t.co/AVFESqSRBf

“For some people, this day is not good. For some people, this is the last one. I speak now for Ukrainian citizens who are paying the ultimate price for defending freedom.” @ZelenskyyUa addressing #EPlenary now ⬇️ https://t.co/uKnumsqMCf

Together with 200+ MEPs we call for the EU’s most severe #sanctions yet against #Russia: ! Oil gas & coal embargo ! Closure of ports to RU vessels & goods ! Disconnect all RU banks from SWIFT ! Extend sanctions to RU oligarchs, officials, civil servants on #Navalny list. https://t.co/ldTN4TlbPu

He who controls the narrative controls the people. #Disinformation costs lives, wars & democracies. ❗️I have written to VP @VeraJourova to convene an urgent summit with social media platforms to tackle the spread of lies and harmful content by #Russia on this war. Read here: https://t.co/Wlo5dlRShj

“While democracy in the long run is the most stable form of government, in the short run, it is among the most fragile” Thank you for showing us what it means to stand up for our democratic values. A trailblazer, a pioneer and an inspiration. Rest in peace #madelinealbright. https://t.co/WLBUjTIIe4